Absent data generating classifier for imbalanced class sizes

نویسندگان

  • Arash Pourhabib
  • Bani K. Mallick
  • Yu Ding
چکیده

We propose an algorithm for two-class classification problems when the training data are imbalanced. This means the number of training instances in one of the classes is so low that the conventional classification algorithms become ineffective in detecting the minority class. We present a modification of the kernel Fisher discriminant analysis such that the imbalanced nature of the problem is explicitly addressed in the new algorithm formulation. The new algorithm exploits the properties of the existing minority points to learn the effects of other minority data points, had they actually existed. The algorithm proceeds iteratively by employing the learned properties and conditional sampling in such a way that it generates sufficient artificial data points for the minority set, thus enhancing the detection probability of the minority class. Implementing the proposed method on a number of simulated and real data sets, we show that our proposed method performs competitively compared to a set of alternative state-of-the-art imbalanced classification algorithms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Class-imbalanced classifiers for high-dimensional data

A class-imbalanced classifier is a decision rule to predict the class membership of new samples from an available data set where the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class resulting in poor accuracy in the minority class prediction. A class-imbalanced classifier typically modifies a ...

متن کامل

Empirical Similarity for Absent Data Generation in Imbalanced Classification

When the training data in a two-class classification problem is overwhelmed by one class, most classification techniques fail to correctly identify the data points belonging to the underrepresented class. We propose Similarity-based Imbalanced Classification (SBIC) that learns patterns in the training data based on an empirical similarity function. To take the imbalanced structure of the traini...

متن کامل

Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data

Learning from imbalanced data, where the number of observations in one class is significantly rarer than in other classes, has gained considerable attention in the data mining community. Most existing literature focuses on binary imbalanced case while multi-class imbalanced learning is barely mentioned. What’s more, most proposed algorithms treated all imbalanced data consistently and aimed to ...

متن کامل

High dimensional classifiers in the imbalanced case

A binary classification problem is imbalanced when the number of samples from the two groups differs. For the high dimensional case, where the number of variables ismuch larger than the number of samples, imbalance leads to a bias in the classification. The independence classifier is studied theoretically and based on the analysis two new classifiers are suggested that can handle any imbalance ...

متن کامل

Fuzzy rough classifiers for class imbalanced multi-instance data

In multi-instance learning, each learning object consists of many descriptive instances. In the corresponding classification problems, each training object is labeled, but its constituent instances are not. The classification objective is to predict the class label of unseen objects. As in traditional single-instance classification, when the class sizes of multi-instance data are imbalanced, cl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 16  شماره 

صفحات  -

تاریخ انتشار 2015